Statistics in Medicine
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-...
Cox proportional hazards regressions are frequently employed to develop prognostic models for time-to-event data, considering both patient-specific and disease-specific characteristics. In high-dimensional clinical modeling, these biological features can exhibit high collinearity due to inter-feature relationships, potentially causing instability and numerical issues during estimation without regularization. For rare diseases such as acute myeloid leukemia (AML), the sparsity and scarcity of data...
Mendelian randomization is currently implemented mainly by using genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome is associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically ...
The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias by modelling GxE interactions and removing the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction strategy eliminated GxE-induced bias with null, linear and non-linear exposure-outcome...
Epigenetic clocks estimate biological age from DNA methylation patterns at CpG sites, providing robust predictions of mortality and morbidity risk. "Blue zones" (regions of exceptional longevity) offer a unique opportunity to investigate how biological aging diverges from chronological age. However, standard clocks are typically trained on large, heterogeneous datasets, reflecting average population trends rather than region-specific dynamics. Using data from the Costa Rican Longevity and Health...
Virtual clinical trials (VCTs) hold significant promise for improving the drug development process, yet their predictive reliability depends critically on design decisions that remain poorly understood. This study examines how model complexity influences VCT outcomes, as well as how the choice of prior parameter distributions and virtual patient inclusion criteria affects those outcomes. Using oncolytic virotherapy treatment of murine tumors as a case study, we compared three mathematical models...
Reliable pediatric virtual patients are essential for model-informed simulations, including physiologically based pharmacokinetic (PBPK) modeling, to support dose selections in children and to evaluate drug exposure across developmental stages. Despite the availability of extensive pediatric physiological data and age- or size-based models, there remains a lack of well-established, flexible, and scalable approaches for integrating these data into realistic pediatric virtual patients that preserv...
Background: Vaccines can prevent severe disease by preventing infection or by reducing progression among those who become infected. Vaccine effectiveness against progression given infection is often used to quantify this second mechanism, but it conditions on infection, which is itself affected by vaccination. As a result, this estimand lacks a clear causal interpretation and may behave non-intuitively over time. Methods: We introduce a conceptual framework that models protection against infection ...
In many countries, demand exceeds supply for elective (non-emergency) hospital treatment, such as hip replacements and cataract removals. The result is a waiting list, which patients join on referral from the family doctor and leave either through treatment or by reneging for other reasons (deconditioning, seeking private healthcare, etc.). Adequate performance is commonly incentivised through the imposition of targets on waiting times. In the first study to do so, we develop a...
Background: Electronic health record (EHR)-based prognostic modeling is increasingly used in oncology, yet incorporating pharmacogenomic (PGx) knowledge derived from experimental systems into clinical prediction frameworks remains challenging. This gap is driven by fundamental mismatches between controlled drug-mutation assays and heterogeneous, incomplete real-world clinical data. Methods: We propose a representation transfer framework that integrates PGx embeddings learned from large-scale in vit...
Clinical deployment of foundation models requires decision policies that operate under explicit error budgets, such as a cap on false-positive clinical calls. Strong average accuracy alone does not guarantee safety: errors can concentrate among patients selected for action, leading to harm and inefficient use of healthcare resources. Here we introduce StratCP, a stratified conformal framework that turns foundation model predictions into decision-ready outputs through error-contro...
Polygenic risk scores (PRSs) quantify an individual's genetic susceptibility to complex traits and diseases. Conventional PRSs, which are based on linear models, perform poorly for phenotypes with skewed distributions or with genetic effects that vary across the distribution. We propose a quantile regression-based PRS (QPRS) that can capture quantile-specific genetic effects. While existing PRSs provide only a single score, QPRS models genetic influences at multiple quantiles of the phenotype, th...
Advanced spatially resolved transcriptomic (SRT) technologies preserve the spatial context of gene expression within tissues, enabling the study of context-dependent transcriptional regulation. Here, we propose VISGP, a variational sparse Gaussian process method for the analysis of spatially variable genes (SVGs) and cellular interactions from such data. VISGP utilizes variational inference and a sparse Gaussian process approximation, which efficiently models the posterior distribution with a set of indu...
Background: Healthcare utilization forecasting systems are often derived from static, annualized market share assumptions that fail to represent real-world treatment dynamics. Such approaches systematically misestimate future utilization by ignoring longitudinal treatment sequencing, discontinuation with surveillance, recurrence-driven re-entry, and provider adoption dynamics. Objective: This study proposes a reusable, governance-driven health informatics forecasting framework designed to generate ...
Background: Many patients with triple-negative breast cancer (TNBC), particularly those who are older, Black, or insured by Medicaid, do not receive guideline-concordant treatment, despite its association with up to 4x higher survival. Early identification of patients at risk for rapid relapse may enable timely interventions and improve outcomes. This study applies machine learning (ML) to real-world data to predict risk of rapid relapse in TNBC. Methods: We trained various ML models (logistic regr...
Background: Multiplex bead assays (MBAs) provide quantitative measurements of many analytes from small sample volumes, reducing cost and processing time compared with traditional immunoassays. These advantages have made MBAs valuable for studying diverse diseases, particularly in low-resource settings. However, most analytical approaches focus on individual diseases, while integrated surveillance platforms would benefit from methods that jointly analyse the full range of pathogens included in mult...
Drug repurposing offers the opportunity to identify promising drug targets efficiently using existing data, but these efforts currently face limitations; there is a particular need for versatile but rigorous high-throughput approaches. We therefore developed a flexible, high-throughput, Mendelian randomization (MR)-based drug repurposing pipeline with three stages: 1) MR-based identification, 2) MR-based validation and prioritization, and 3) application. This pipeline can be applied to a...
Background: Typhoid fever incidence estimates are central to policy decisions on vaccine introduction and investments in non-vaccine prevention and control but are often unavailable. We explored whether prevalence metrics from sentinel studies of community-onset bloodstream infections could accurately predict local Salmonella Typhi (S. Typhi) incidence. Methods: Using a previous systematic review (January 2018-December 2024), we identified studies reporting both typhoid incidence and prevalence of ...
Introduction: Genome-wide association studies (GWAS) have identified hundreds of variants linked to cancers, but their downstream regulatory consequences remain poorly understood. Increasing evidence suggests that related cancers share alterations of common regulatory programs. Trans-associations of cancer risk variants mediated via molecular phenotypes, such as gene expression and protein levels, can help uncover these downstream mechanisms. Further investigation of such convergence can reveal sh...
A substantial proportion of recovered deceased-donor (DD) kidneys go unused. Accumulated refusals by transplant centers during the offer process may signal nonuse risk, and quantifying this phenomenon could inform frameworks for rescue strategies or out-of-sequence (OOS) placement. Using OPTN data on adult DD kidneys offered for transplant in 2024, we empirically estimated the probability of nonuse as a function of accumulated refusal count (ARC). Kidneys transplanted OOS were excluded from anal...